276 research outputs found

Mining a medieval social network by kernel SOM and related methods

Author: Rossi Fabrice
Truong Quoc-Dinh
Villa Nathalie
Publication date: 09/05/2008

This paper briefly presents several ways to understand the organization of a large social network (several hundred persons). We compare data mining approaches for clustering the vertices of a graph (spectral clustering, self-organizing algorithms, ...) and provide methods for representing the graph from these analyses. All these methods are illustrated on a medieval social network, and the way they can help to understand its organization is underlined.

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL Descartes

HAL-INSA Toulouse

Discrimination de courbes par régression inverse fonctionnelle

Author: Ferré Louis
Villa Nathalie
Publication venue: Société française de statistique
Publication date: 06/12/2004

19 pages. National audience. Inverse regression methods such as SIR (Li, 1991) were developed in multivariate regression to avoid the well-known curse of dimensionality. They have recently been extended to functional data. Several approaches have been proposed, and we present here a survey and comparison paper that addresses the case where the response variable is a vector of class-membership indicators. We show that inverse regression then leads to a discrimination method whose relevance is established on real and simulated data.

Scientific Publications of the University of Toulouse II Le Mirail

Numérisation de Documents Anciens Mathématiques

Clustering a medieval social network by SOM using a kernel based distance measure

Author: Boulet Romain
Villa Nathalie
Publication venue: M. Verleysen
Publication date: 01/04/2007

6 pages. International audience. In order to explore the social organization of a medieval peasant community before the Hundred Years' War, we propose the use of an adaptation of the well-known Kohonen Self Organizing Map to dissimilarity data. In this paper, the algorithm is used with a distance based on a kernel which allows the choice of a smoothing parameter to control the importance of local or global proximities.

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Un résultat de consistance pour des SVM fonctionnels par interpolation spline

Author: Rossi Fabrice
Villa Nathalie
Publication venue: Elsevier BV
Publication date: 01/01/2006

This Note proposes a new methodology for function classification with Support Vector Machine (SVM). Rather than relying on projection on a truncated Hilbert basis as in our previous work, we use an implicit spline interpolation that allows us to compute SVM on the derivatives of the studied functions. To that end, we propose a kernel defined directly on the discretizations of the observed functions. We show that this method is universally consistent. Comment: 6 pages

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

Comptes Rendus Mathématique

INRIA a CCSD electronic archive server

Numérisation de Documents Anciens Mathématiques

Storms prediction : Logistic regression vs random forest for unbalanced data

Author: Ruiz Anne
Villa Nathalie
Publication date: 01/01/2007

The aim of this study is to compare two supervised classification methods on a crucial meteorological problem. The data consist of satellite measurements of cloud systems, which are to be classified as either convective or non-convective. Convective cloud systems correspond to lightning, and detecting such systems is of prime importance for thunderstorm monitoring and warning. Because the problem is highly unbalanced, we consider specific performance criteria and different strategies. This case study can be used in an advanced data mining course to illustrate the use of logistic regression and random forest on a real data set with unbalanced classes.

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

Toulouse Capitole Publications

Toulouse 1 Capitole Publications

HAL-INSA Toulouse

Random Forests for Big Data

Author: Genuer Robin
Poggi Jean-Michel
Tuleau-Malot Christine
Villa-Vialaneix Nathalie
Publication date: 19/11/2015

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data, but they also often include online data and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles, in a single and versatile framework, regression problems as well as two-class and multi-class classification problems. Focusing on classification problems, this paper proposes a selective review of available proposals that deal with scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities, such as out-of-bag error and variable importance, are addressed in these methods. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), a simulated one as well as real-world data. One variant relies on subsampling, while three others are related to parallel implementations of random forests and involve either various adaptations of bootstrap to Big Data or "divide-and-conquer" approaches. The fifth variant relates to online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

ProdInra

Hal-Diderot

Neural Networks for Complex Data

Author: Cottrell Marie
Olteanu Madalina
Rossi Fabrice
Rynkiewicz Joseph
Villa-Vialaneix Nathalie
Publication venue: Springer Science and Business Media LLC
Publication date: 24/10/2012

Artificial neural networks are simple and efficient machine learning tools. Defined originally in the traditional setting of simple vector data, neural network models have evolved to address more and more difficulties of complex real world problems, ranging from time evolving data to sophisticated data structures such as graphs and functions. This paper summarizes advances on those themes from the last decade, with a focus on results obtained by members of the SAMM team of Université Paris

arXiv.org e-Print Archive

HAL-Paris1

Analyse de données pour des graphes étiquetés

Author: Laurent Thibault
Villa-Vialaneix Nathalie
Publication venue: HAL CCSD
Publication date: 21/05/2012

International audience. We propose a data mining method for a graph whose vertices are labeled. Two approaches are described and illustrated on a real data set: they yield a representation of the graph that combines information on its structure and on the values of its labels. This visualization can be used for interpretation, to provide more nuanced information on the characterization of the graph's vertices.

HAL Descartes

HAL-Paris1

Hal-Diderot

sexy-rgtk: a package for programming RGtk2 GUI in a user-friendly manner

Author: Leroux Damien
Villa-Vialaneix Nathalie
Publication venue: HAL CCSD
Publication date: 27/06/2013

National audience. There are many different ways to program Graphical User Interfaces (GUI) in R. Lawrence and Verzani (2012) provide an overview of the available methods, describing ways to program R GUI with RGtk2, qtbase and tcltk. More recently, the package shiny, for building interactive web applications, was also released (the first version was published in December 2012). By automatically indexing all objects and methods available in RGtk2, we developed a method for creating GTK2-based GUI in a friendlier and more compact manner. Widgets are accessible with simple functions and options, as is more natural for an R programmer.

HAL Descartes

HAL-Paris1

Hal-Diderot

Consistency of Derivative Based Functional Classifiers on Sampled Data

Author: Rossi Fabrice
Villa-Vialaneix Nathalie
Publication venue: HAL CCSD
Publication date: 01/01/2007

International audience. In some applications, especially spectrometric ones, curve classifiers achieve better performance if they work on the m-order derivatives of their inputs. This paper proposes a smoothing spline based approach that gives a strong theoretical background to this common practice.

CiteSeerX

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL-INSA Toulouse